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In this paper, I investigate more closely the recently proposed Free Energy Monte Carlo al- 
gorithm that is devised in particular for calculations where conventional Monte Carlo simulations 
struggle with ergodicity problems. The simplest version of the proposed algorithm allows for the 
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■ determination of the entropy function of statistical systems and/or performs entropy sampling at 

sufficiently large times. I also show how this algorithm can be used to explore the system's energy 
fa ' 

CZ • space, in particular for minima. 
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| I. INTRODUCTION 

CO 

o 

t^. ' With fast-growing computer technology, Monte Carlo (MC) simulations have with much success been used to study 



various statistical systems including neural networks, problems in biology and chemistry, lattice-gauge theories and 

optimisation problems in various areas, not to mention statistical physics, to study the properties of phase transitions 
H ■ 

I and critical phenomena. 

t3 : 

Most MC simulations concentrate on importance sampling for the canonical or microcanonical Gibbs ensemble, 
introduced by Metropolis et al jjj. The thermodynamic average, O, of an observable O(x) can be estimated ||) as 
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where x T represents a configuration at time r of a system with Hamiltonian H, j$ = (kT)^ 1 is the inverse temperature 
(with Boltzmann constant fc), t a an averaging time (with t a 3> 1), and P(x) a sampling probability. If P(x) is chosen 
to be constant, very few samples contribute significantly to the sums in the above equation, and a very long time 
is required to get a reasonable estimate of O. Importance sampling comes in if one chooses P(x) as the Boltzmann 
weight exp(— (3H (x)). It is generally a good sampling algorithm, but can fail to access all the parts of the phase space 
in available computer time. Indeed, in many situations, this approach or similar ones face severe ergodicity problems 
if there exist many high barriers between all the possible lowest- (or nearly-lowest-) energy configurations, as, e.g., in 
certain Lennard- Jones systems (see, e.g., HQ) and in spin glasses (see, e.g., |3]-||), to cite only two examples from 
statistical physics. 
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To overcome these, at least for a large part, it has been suggested that it could be more efficient to reconstruct 
the Gibbs ensemble from a simulation with other ensembles (e.g., a so-called "multicanonical MC simulation") than 
to simulate it directly (see Q and references therein). One of these approaches goes under the name of the entropy 
sampling Monte Carlo method (ESMC). It works as follows. Let the probability of occurence of a configuration x with 
energy E be denoted as P(x), and the probability of occurence of a state with energy E as P(E). The term "state" 
stands here for the set of all configurations that have the same energy. They are related to each other through 

P(x) oc e~ pB , 

(2) 

P{E) oc N{E)e~' 3E = e s ^/ k -^ E , 

where I have introduced the entropy, S(E), of the state with energy E. Their number is N(E). In the Metropolis MC 
method jj]], the canonical distribution of states is obtained, along with ergodicity, by a Markovian sequence in which 
the transition probabilities, tt(x — > x') and n(x' — > x), between a pair of configurations x and x' are determined by 
the detailed balance condition 

tt(x -> x') _ exp[-/3E(x')] 



n(x' — > x) exp[— (3E{x) 



(3) 



It can be shown rigorously that, in the case of a traditional MC simulation, this condition ensures a simulation of the 
system with an equilibrium distribution which is just the Gibbs distribution fiof . The ESMC method is, however, 
based on the probability distribution of states, in which the probability of occurence of a configuration with energy 
E is proportional to the exponential of the negative entropy, 

P(x) OC e S{B(x))lk 

(4) 

P(E) oc N(E)e~ s ^/ k . 

In a ESMC simulation the probability of occurence of a configuration with energy E is therefore anti-proportional to 
the number of configurations with that energy. In this way, the probabilities of occurence of all states equal the same 
constant. An MC algorithm that does the job is one that is based on the detailed balance condition 

ir{x^x') exp[-S(E(x'))/k] 



tt(x' -» x) cxp[-S(E(x))/k] 



(5) 



In all other aspects, the formalism of the ESMC procedure follows then the usual Metropolis procedure. It is therewith 
easy to show (using the methods exhibited, e.g., in JlIJ) that the ESMC algorithm simulates the system in such a 
way that all states occur with the same probability. Hence, the algorithm provides for a (one-dimensional) random 
walk through the system's energy space. 

In this spirit, Monte Carlo sampling with respect to unconventional ensembles has received some attention (see, 
e.g., || and references therein) in recent years. In the "multicanonical ensemble" approach [|l2|]l3| , |l"8f| , one samples 
configurations such that the exact reconstruction of canonical expectation values becomes feasible for a desired tem- 
perature range. Multicanonical and related sampling has allowed considerable gains in situations with "supercritical" 
slowing down, such as 
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(i) first order phase transitions |l2Jl3,E2], (for a recent review see, e.g., [p4|), 



(ii) systems with conflicting constraints, such as spin glasses |15| , [17|]3l[|32f or proteins |2 



The reconstruction of canonical expectation values requires knowledge of the entropy values of the/an important part 
of the energy range (see equation (|l|)), but leaves innovative freedom concerning the optimal shape Considerable 
practical experience exists only for algorithms where one samples such that: 

(a) The probability density is flat in a desired energy range P(E) = const. 

(b) Each configuration of fixed energy E appears with the same likelihood. 

It should be noted that condition (b) is non-trivial. A simple algorithm |3S| ] exists to achieve (a), but which gives 
up (b). Exact connection to the canonical ensemble is then lost. Such algorithms are interesting particularly for 
hard optimisation problems, but may be unsuitable for canonical statistical physics. The present paper focuses on 
achieving (a) and (b). 

To achieve a flat energy distribution, the appropriate unnormalised weight factor in equation (Q) is S(E). However, 
before simulations, the entropy function S(E) is usually not known. Otherwise we would have solved the problem in 
the first place. Presumably, reluctance about simulations with an a-priori unknown weight factor is the main reason 
why the earlier umbrella sampling [Q never became popular in statistical physics. 

In the more recent papers [9-34] it has been suggested to overcome this loophole by simulating with approximate 
entropy values, obtained by guessing or a short Gibbs run, and then successively simulating with ever better estimates 
of the real entropy. For, if an incorrect entropy function, J{E), is used in equation (Q) instead of S(E), then the states 
with J(E) < S(E) will have a larger probability to occur than in an exact ESMC simulation, and, therefore, will be 
sampled more frequently; similarly, those states with S(E) < J(E) will have a smaller one and will be sampled less 
frequently. Because the entropy function, S(E), is proportional to the logarithm of the number of configurations in the 
corresponding states, which in turn is proportional to the number of visits to the states expected by the simulation, 
an iterative process in which runs with ever better J n (E) are construed can be constructed. In practice, one simulates 
with J\{E) for an "appropriately" long time, then J2(E) is constructed, one simulates with J2{E) for some time, etc, 
and one hopes, supported by intuition, that the J n {E) gradually approach the exact S(E). This assumption has, 
however, up to now never been shown to hold rigorously, nor is there anything known on convergence properties. 

Nevertheless, for first order phase transitions in non-random systems, like ferromagnets, the problem of the a-priori 
unknown weight factor in the above algorithms is rather elegantly overcome by means of finite size scaling methods 



J_2|,y_3 22 2jj,|25|,|34j . A sufficiently accurate estimate is obtained by extrapolation from the already simulated smaller 
lattices. The smallest lattices allow still for efficient canonical simulations. For systems with conflicting constraints 
the situation is less satisfactory. Considerable attention "by hand" may be needed. 

In this paper, I investigate more closely the recently proposed Free Energy Monte Carlo algorithm [pT| from which 
the entropy can be obtained in the large-time limit without attention "by hand" . I have applied the algorithm to 
the infinite-range, the two-dimensional and the three-dimensional ferromagnet. Where possible, I have compared the 
results to the values that are known exactly. One further application of the algorithm is the surmounting of energy 
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barriers and/or a thorough exploration of the system's energy space. In this paper, I concentrate on magnetic systems, 
which have a discrete energy spectrum. However, the algorithm is easily extended to other systems (whose energy 
spectrum would have to be discretised for the use of a computer anyway). 

In the next section, I describe the algorithm and show that, under certain conditions, the algorithm converges 
indeed towards the correct entropy distribution. Numerical results are exhibited in the third section. I conclude 
finally in the last section with some outlooks and further comments concerning the application of this algorithm to 
systems other than magnets. 



I will now specify the algorithm which achieves the goal of obtaining the entropy function in the large-time limit. 
For definitencss, I will formulate the algorithm for Ising spin systems, containing a total number of N spins. 

I will enumerate the different states of the system by their energy, going from smallest to largest in value. Let 
E(m) denote the energy of the state with label m, say m = 0, 1, . . . , M — 1 (so that TV is the total number of energy 
levels). Let furthermore S(m;t) be the (estimated) entropy of the state with label m, at time t. At time t — 0, I 
initialise S(m; 0) = for all m. We will see below that only entropy differences matter in the algorithm so that the 
initialisation constant can in principle be chosen arbitrarily. The initialisation to zero is, however, preferable in the 
actual implementation of the algorithm on a computer, as mentioned below. 

The Free Energy Monte Carlo (FEMC) algorithm then works as follows. Let us assume that, at time t, the system 
is in configuration x, with energy E(x), i.e., with label m(x). Then, go through the following steps at time t + 1: 

1. Select one spin index i for which the spin Sj is considered for flipping (s, — > — Sj). 

2. Calculate the transition probability 



to pass from configuration x to configuration x' which is obtained from x by effectuating the considered spin 
flip. [In this present form the algorithm is still rather an "entropy Monte Carlo" than a "free energy Monte 
Carlo" algorithm; I will dwell on the full version of the latter below.] 

3. Draw a random number, r, uniformly distributed between zero and one. 

4. If r < -k(x — > x';t+l), flip the spin, otherwise do not flip it. In any case, the configuration of spins obtained at 
the end of step 4. is counted as the "new configuration", x updato . 



II. THE FREE ENERGY MONTE CARLO ALGORITHM 



A. The algorithm 




(6) 
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5. Now update the values of the entropy (fi = 0, 1, . . 

S(/*;t+l) := 



.,7V- 1): 
S(/i;t)+e(t)<L 



(7) 



where e(t) is a pre-chosen positive function which is sufficiently small in the large-time limit, m(x) the label of 
energy of the configuration x, and S denotes the Kronecker symbol. 

6. Go to 1. or end. 

Let us just note here that by the choice of n(x — > x'\ t + 1) as above, I ensured that the simulation verifies a detailed 
balance condition "locally", i.e., at every time step. Furthermore, in the proof of the convergence of the algorithm 
below, I will use a different update rule for the entropy values, namely, 

/ S(m(x)-t)l e(t+l) _ -n\ 

Sfa t + l) = S(/x; t) + e(t + l)^, m( x) - log f 1 + ^ " ) • ( 8 ) 

In this latter case, I shall also assume that the sum over m of exp(S(m; t)) is normalised to equal the total number of 
configurations, 2 N , at time t = 0. The choice of the update rule, equation (||), then ensures that this normalisation 
holds for all times, t. If e(t) is chosen to be small, then S(m;t) can take on quasi-continuously values between — oo 
and AT log 2, i.e., S(m;t) G [— oo, iVlog2] and the corresponding (estimated) number of configurations is bounded to 
values between and 2 N . The interchanging of the less calculation intensive update rule, equation (^), with equation 
(^) is possible because only differences in entropy enter into the calculation of the transition probability. After every 
time step, one should imagine, using equation (0), that the "zero" (baseline of the entropy values) has been shifted 
upwards. The entropy may then be obtained as a time average over instantaneous entropy values, and normalising 
these values with the help of the total number of configurations, 2^. The only problem one has to cope with, when 
using the original algorithm, is that one may have to substract the "baseline" from all entropy values every so often 
to avoid overflows or numerical inaccuracies when substracting large numbers of equal order of magnitude from each 
other. 

The above version of the algorithm does not deserve to be called an FEMC algorithm just yet, as only the entropy 
enters. However, if one wants to detect the minima (or maxima) in the energy space, it may be useful to change the 
transition probability of the algorithm to read: 

*(* ^ x '-,t + 1) : = \ (i - (9) 

where now the transition probability does not depend solely on (instantaneous) entropy differences, but on (instanta- 
neous) free energy differences, (3 being the "inverse temperature" as usual. If is large, the temperature is small, and 
the system stays preferably in configurations with low energy (if one were to take (3 small or even negative (!) one 
stays obviously preferably in configurations with high energy). This can be illustrated in particular at the beginning 
of the algorithm, when the entropy differences are zero, and one performs a gradient descent algorithm towards a local 
or global minimum out of which one is then taken by a gradual increase in (instantaneous) entropy differences. I will 
dwell on this aspect further below. With this replacement of the transition probability, the Monte Carlo algorithm can 
now truly be called Free Energy Monte Carlo (FEMC) algorithm . In the next subsection, I will, however, consider 
again the j3 = case only, for simplicity. 
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B. The master equation and convergence properties 



I will now write down a master equation for the probability, P(S,m;t), to be at time t in a state with energy 
E(m) and an (estimated) entropy function, S — S(n) (/x = 0, 1, . . . , Af — 1). By introducing a dependence on 5", 
we will avoid a time dependence in the transition probabilities in the master equation below. Furthermore, in this 
notation (and using the version of the algorithm with equation (|J) where the estimated values for the number of 
configurations per energy is bounded), the fact that the convergence of the master equation to a (unique) limit- 
distribution is ensured mathematically rigorously is easily noticed (using the methods of [To| ) . In the following, I will 
also write ir(S, m ± 1 — > to) for the transition probability of moving from a configuration with energy E(m ± 1) to one 
with energy E(m), given that at that time, the estimated entropy function is S. For simplicity, I will consider the 
specific case of the infinite-range ferromagnet in which there are only transitions between configurations where the 
corresponding states are "nearest-neighbours" in energy space. Furthermore, I will take e to be small, but constant. 
The generalisation to other systems is straightforward. 



1. The infinite-range ferromagnet 

I consider an infinite-range Ising ferromagnet consisting of N spins, Sj (i = 1,2, ... ,N), connected by bonds of 
strength J (I will set J = 1 in what follows for simplicity). The system's Hamiltonian, Hn, therefore reads 

N N-l 

As usual, the possible values are Sj = ±1. I will say that a spin, Si, points upwards if Si = +1 and downwards 
otherwise. The number, to, of upwards pointing spins is 

N 



= + . (u) 



TO 

^ 2 

i=i 

I will enumerate the different states of the system by their energy, going from smallest to largest in value. By 
symmetry, all configurations with to or N — to upwards pointing spins have the same energy 

= -A^-l) + 2TO(iV-TO) 
V ' 2N V ' 

The total number of energy levels, Af, is therefore 

N, riV + r 



(13) 

Here the (Gauss) brackets, [x], indicate the smallest integer equal or larger than x, as usual. So, the variable to in 
equation ( [l2| ) runs from to Af — 1. It is finally to note that, to every energy level, E(m), corresponds an entropy, 
S(m), that is easily calculated for the infinite-range Ising ferromagnet. By symmetry, the number of configurations, 



G 



u m , with energy E{m) is twice the Bernoulli number obtained by selecting m spins out of the N, if m < TV, and 
equals the Bernoulli number for m = Af, i.e., 



(1 + <W,JV+l) 



, if m < N , 



, if m = N . 



(14) 



The entropy values are therefrom obtained by taking the natural logarithm (I set k — 1 here and from now on for 
simplicity). 



2. The master equation 

The probability distribution P(S,m;t) (where m = 0, 1, . . . ,7V — 1) evolves in one time step At (chosen here to 
equal 1) as follows 

P(S,m;t + At) = jv ^ 1+1 ■K{S^ m - 1 \m- 1 -> m)P{S^ 1 \m - \-t)At 
+ 2i±± 7r(5("»+i) , m + 1 -> m)P(5( m+1 ) , m + 1; t)At 

-f7r(S,m-fm-l)P(S,m;t)At ( 15 ) 



■ w(S, m^m+ 1)P(S, m; t)At 



N 

+ P(S,m;t) 

The first two terms come from the probability of moving between time t and t + At into the state with energy E{m) 
with estimated entropy function S and the other ones from the probability of not leaving it. The functions 5 l ( m±1 ) 
differ from S by the values obtained from applying the algorithm at time t to a configuration with label m ± 1 to the 
function 5"( m±1 ) and getting S, i.e., 

5( M ) = ^ m±1 )( M ) + 6^ m±1 -log( 1 + ^ M , (/i = 0,l,...,JV-l). (16) 

The factors in front of the transition probabilities reflect the number of spins which can be flipped in the configuration 
at time t to obtain a configuration with energy E(m). More generally, they reflect they ratio of the respective volume 
fraction of the total configuration space. It is tacitly understood that for m = and m = TV — 1 the appropriate 
expressions in the equation above are zero. 

To see whether the algorithm is really providing convergence towards the correct entropy function, I define yet 
another probability distribution, the probability P(S; t) to obtain at time t an entropy function S. This probability 
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distribution should become a delta-peak on the true entropy function if one lets the time, t, go to infinity and then 
takes the limit e — > 0. This amounts to our goal that the algorithm gives a good approximation of the true entropy 
in the infinite-time limit. Let therefore 

AT-i 

P(S;t):= ^P(S,m;t) . (17) 

m=0 



Using equation (15), the time evolution of this probability distribution can be written down, 



JV-1 

P(S;t + l) = N ~n +1 n{S {m ~ l) ,m - 1 -> m)P{S^ m - 1 \ m - 1; t) 

m—l 
JV-2 

+ E m ^-^{S {m+1 \m+ 1 -> m)P(S {m+1 \m + 

m=0 

- 7r(S,m^m-l)P(S,m;t) ( 18 ) 

m—l 

- t ^( 5 > m -> m + 1 ) p ( 5 ' m ; *) 

m— 

+P(5;i) . 

I will now consider this equation in the infinite-time limit. Further convergence properties, which are easily obtained 
analytically in the case of the infinite-range ferromagnet, shall be published elsewhere pq]. 



3. The convergence towards the true entropy function 



As mentioned in the previous subsection, the probability distribution P(S, m; t) converges, for t — > oo, to a stationary 
distribution, P oq ^(S, m). Therefore, also P(S; t) converges to a stationary distribution, P oq (S'). This distribution verifies 
the equation 

= N -™ +1 tt^-^to- 1 -> m)P oq (S( m - 1 ),m- 1) 

rn— 1 

+ ^ ^ 7r(S*( m+1 ) , m + 1 -> m)P oq (5(" l+1 ), m + 1) 

m=o n 9 s 
AT-i v ' 

- £ f 7r(S,TO^m-l)P eq (S,m) 

m—l 

- m - m + l)P oq (S, m) , 

m=0 

and we shall see in the following that after developing this equation in orders of e and then taking the e — -> 0-limit, 
the estimated entropy function coincides with the true entropy function. [Note here that the limit t — > oo and e — ► 
are not interchangeable. Taking e — > first is the limit of performing a random walk in configuration space !] 

At time t, we have the following update 

(e e -l) , m = 0,l,...,7V-l , (20) 



S(m; i + 1) = S(m; t) + e8 m . v(t) - log f 1 + 2jV (e £ - 1) 
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and I have written v(t) to indicate the energy level of the configuration at time t. As v(t), and therewith also S(m; t) 
for all m, is the result of a stochastic process, both v{t) and S(m; t) for all m are random variables. Let us therefore 
define P(v;t) to be the probability of being in a configuration with energy E(y) at time t, and develop it in the 
infinite-time limit [note again that the limit exists !] in orders of e, 



lim P{v- 1) = pW(v) + ep [1 \v) + 0{e 2 ) , e -» 



(21) 



The change in entropy AS(m; t + l,t) between time t and time t + 1 is 



AS*(m; i + 1, t) := S{m; t + 1) - S(m; t) 



>m,v{t) 



2 N 



+ 0(e 2 ) , 



(22) 



Now, we know that lim P(5, m; exists, so that we have the stationarity condition 

t— ►00 

lim < AS(m; t + 1, i) > ta = , 



(23) 



where < • • • > io denotes a time average (over time t a ), with i a — > oo, while holding ^- < 1 fixed. Indeed, this average 
in the infinite-time-limit is nothing but the same as the result of a simulation at equilibrium. 



Using equation (22) and developing the left-hand-side of equation ( |23| ) in orders of e, 
lim < S(m;t) >t a = S* (0) (m) +eS {1) (m) + 0(e 2 ) , e^O, 



(24) 



we can rewrite the stationarity condition, (|23|), as 



0= lim < AS(m;t + l,t) > ta = e lim 

t—>CO t^QC 



< $m,v{t) >t a 



< e 



>ta 



2 N 



+ 0(e 2 ), e^O, 



(25) 



and the right-hand-side must vanish order by order, in particular, 



lim < S, 



>t= lim 



< e s ^» > ta 
2^ 



(26) 



where I just have to emphasise again that the limits on both sides of equation ( |26| ) exist. Note now that the right- 
hand-side of equation (|2(]) is independent of m, and therefore a constant, and that the left-hand-side of equation j2^), 
using equation (|lj), equals 



lim < S mMt) > ta = p (0) (m) + 0(e) , 



(27) 



By normalisation, we hence have 



P(u) := t lim < P(u;t) > t = ^ + 0(e) 



e -> 



i/ = 0,l,...,jV-l 



(28) 



This means that in the infinite-time limit, all energy levels are equally probable to occur. This fact is at the origin 
of the possibility to replace equation (Q) by equation (||) in the update of the algorithm in step 5., ensuring that 
indeed on average, only the "zero" /the baseline increases. Furthermore, if the algorithm is local in the sense that 
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"neighbouring" configurations imply "neighbouring" energies, then the typical time to reach an energy level, m/, 
starting from another one, m , scales like (rrif — m ) 2 , as a general result from what is known about one-dimensional 
random walks. 

We still have to show that the equilibrium values for the (estimated) entropy function obtained through the simu- 
lation do indeed, to zeroth order in e, equal the true entropy values ot the system. Let us first note that 

P{v) = P({*i})=">»P{{si}W*i})=E(y)) (29) 

where P({si}) is the equilibrium probability for the configuration {s^}, with energy E({si}) = E{y), to occur. In 
the following, we will also write the shorthand P(y) for the probability of being in a configuration {s^}, with energy 
E({si}) — E(v), and P(y;t) when considering time-dependent properties, respectively. Furthermore, by definition, 
we have 

(30) 



2 N 

where So(y) is the true entropy of level v and therefore 



2 N 

p i v ) = ,r q f ^ ■ ( 31 ) 
The master equation ( |l8| ) for the considered algorithm reads in the considered infinite-time limit 

= lim (< N -™ +1 iv(S<> m - 1) ,m-l->m)P(m-l;t) > ta 
+ < ?H±±ir(S( m+1 \m + 1 -> m)P(m + l;t) > ta 

(32) 

- < jjTr(S, m — > m- l)P(m; t) > ta 

- < ^tt(S, m^m+ l)P(m; t) > u ) , 

i.e., using equation (^) for the transition probabilities and developing in orders of e, 

N -m+l e sW(m-i)- fl (0)( m) So (m-l) | m+1 ^'"W-^'W r -S n (m+l) 

U — JV l + e SW(m-l)-SW(m) 6 + iV 1 + e S<0)(™+l)-S(0)( m) 6 

_ m e s(°)( m) - s (°) (m -i) So(m) _ m eS (°)( m) -s(°) (m+ i) So(m) (33) 

N 1 + e s(0)(m)-S(0)( m -l) C A? 1+e sW(m)-S(»)(m t l) C 

+ 0(e), e^O, m= l,...,Af-2 , 
and similarly the equations for m = and m = M — 1. 

Defining the difference 55^°^ (m) := S^(m — 1) — S^°>(m) for m = 0, . . . , Af — 2 and using exp(5o(m)) = uj m in the 
above system of equations, solving for SS^(0), then SS^\ and so forth, it is easy to see that the unique solution of 
the system ( |33| ) is indeed 

S^im) = S (m) . (34) 
It is likewise easy to see that with this identity the first and the third, and the second and the fourth, terms in every 



equation of (33) cancel each other, respectively, by detailed balance. Hence, to lowest order in e, the estimated entropy 



function indeed coincides with the true entropy function of the system. 
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III. NUMERICAL RESULTS 



I have applied the algorithm to the infinite-range, the two-dimensional and the three-dimensional ferromagnet to 
see how well the simplest version of the algorithm (the "/3 = FEMC algorithm" ) performs on obtaining the entropy 
values of the considered systems and on overcoming energy barriers. More specifically in the case of the infinite-range 
ferromagnet, where I know the entropy (or number of configurations) exactly, I have investigated the convergence 
to the correct values of the entropy as well as the passage times ( "tunneling times" ) between the "all-spins up" and 
"all-spins down" ground states. In the case of the two- and three-dimensional ferromagnets, I have also studied these 
passage times in order to see whether the algorithm really provides for a quick "tunneling" through the energy barriers. 
A more thorough study on the role of the ratio of e and /3 in the truly FEMC algorithm on the tunneling times shall be 
published elsewhere (3qj . Last but not least, I have compared the values obtained using the algorithm with the ones 
known exactly for the case of the 4x4x4-ferromagnet. The values of e that I use are between e = 10 _1 and e = 5 ■ 10~ 4 . 
Certainly, e should tend to zero with increasing t to obtain even more accurate values of the entropy. However, the 
smaller e is already in the earlier stages of the algorithm, the longer it takes to reach the asymptotic stage. If one 
takes e too small at the start of the algorithm, one risks to never leave, during the time of the simulation, the regime 
where effectively one samples according to performing a random walk in configuration space, as the differences in the 
entropy values will stay too small in the available simulation time. A good way to measure whether one has reached 
the asymptotic regime of the algorithm yet or not is to keep track of the sampled energies: if the histogram of the 
energies, averaged over a long enough period of time, is flat, asymptotics are reached. 

Some comments on the large-e limit are also in place. For e of order 1 or larger, the transition probability (|^) of 
accepting a move becomes essentially or 1, as the differences in estimated entropy values for the different levels 
become larger than 1, hence the argument of the tanh. This has the following consequences. If, e.g., one is at time t, 
say, in a configuration of the level v{t), and the values of the estimated entropy values at the adjacent levels v(t) ± 1 
are S{v{t) — 1) < S{v(t)) < S(v(t) + 1), then moves are (essentially) accepted iff the level of the considered update, 
Update is Update = v(t) — 1. The algorithm can be viewed as putting a brick of height e at every time step onto a wall 
which is building up during the process. If the height of the wall at the adjacent place (level) whereto the move at 
time t + 1 is considered is lower, the move is accepted with probability ~ 1, if the wall is of equal height, the move is 
accepted with probability ^, and else it is (essentially always) rejected. It is easy to notice that on average the wall 
will be of equal height for all levels if one substracts a running (increasing) baseline. This leads to the observed fact 
that all energy levels occur with equal probability, albeit the fact that the weight factors in the algorithm are not 
proportional to the true entropy values (on average). 

A. The infinite-range ferromagnet 

I have considered systems which contained N — 2 n spins where n = 2,3, ... ,9. I have compared runs where 
the "tunneling times" where measured from the beginning with ones where the tunneling times of the first x ■ M 2 
[x = 1, 2, ... , 10) Monte Carlo steps (MCS; defined as usual as the time needed to update all N spins in the system) 
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were not taken into account. The (expected) experience from these runs is that the distribution of tunneling times, 
its mean (r m ) and random mean square (rms) value (T rma ) remained essentially unaltered. However, to be on the safe 
side, I have not counted the tunnelings observed during the first 10 • J\f 2 MCS in the runs whose results are displayed 
in the following. In all of the different runs, the histogram of the energies, averaged at the same time than the 
(instantaneous) entropy values, is flat, i.e., fluctuates around the mean number of sampled configurations per energy 
to within at most a small fraction of a percent for small system sizes and up to at most ±5% for the largest systems 
considered. 

The results shown in the figures have been obtained with a total number of 10 6 + 10 • N 2 MCS. I have fixed e to 
equal 10~ 3 , 10~ 2 , and 10 _1 , respectively. This implies that the values r m and r rms , the mean and the rms value of the 
tunneling times, have been obtained for e = 10 -3 from 149338 values for n = 4 to 659 values for n = 9, for e = 10~ 2 
from 148726 values for n = 4 to 1218 values for n = 9, for e = 10 _1 from 146682 values for n = 4 to 1826 values for 
n = 9. To check for statistical reliability of the data, I have also performed slightly shorter and longer runs. The error 
bars that I have got from these runs, at fixed averaging times (of the instantaneous entropy), are smaller than the 
size of the symbols in the figures. The distributions for the tunneling times, t, themselves arc, however, intrinsically 
very broad (their width is of the order of their mean) with an accordingly long tail, possibly power-law-like. This can 
be seen in figure 1, where the distribution, P(t), of the "tunneling times", r, is shown for a run of 1002560 MCS and 
e = 10~ 2 with N = 2 32 spins. In total, 14941 tunnelings have been observed during the last 10 6 MCS, therefrom 45 
events with a tunneling time larger than 10 • r m = 10 ■ 1071 MCS. 



0.025 



0.02 



Q_ 



0.01 







K 





o o 


i 




1 


1 1 1 


- 

o 



O 





o « 


» 












00 o 
<3£> 











O 
O 










4 




<>0 
















0O 



°0> 







i 




0>#0> 

o<o 

^ « A 





2000 4000 6000 8000 10000 

i 

FIG. 1. Plot of the distribution, P(r), of the "tunneling times", r, for a run of 10 6 MCS and e = 1CT 2 with N = 2 32 spins. 
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In the runs displayed in the figures of this subsection, at most 432 tunneling events with times above 5 • T m occured 
for N — 2, this number rapidly decreasing with system size to at most 2 such events for N = 512. The statistics for 
the larger systems may, indeed, be insufficiant because of the long tail of the distribution (the effect of a reduction of 
the measured r m can clearly be seen in figure 2 for the runs with e = 1CP 2 and 256 respectively 512 spins). However, 
the resulting error should not alter the results significantly. 

Figure 2 shows the increase of r m with the number of the spins in the system on a log-log scale. 
14 i 1 1 1 1 1 1 1 1 1 1 1 
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FIG. 2. Log-log-plot of the mean "tunneling times" r m versus the number of spins, N = 2 n (n = 2, 3, . . . , 9), in the system 
for the infinite-range ferromagnet. The diamonds stem from simulations with e = 1CP 3 , the crosses from ones with e = 1CP 2 , 
and the open squares from ones with e = 10 _1 . 



I have fitted straight lines (by eyesight) to the data points in figure 2, corresponding to the fits r m = c m A f(5m , for 
the different values of e. Because of higher statistical reliability of the results of the smaller size systems these have 
been taken account more than the ones of the larger size systems. The results for the fit parameters are 



ln(c m ) w 0.36 , S m w 2.10 , e = 10~ 3 , 
ln(c m ) w 0.42 , 6 m w 2.07 , e = 10~ 2 , 
ln(c m ) w 0.54 , <5 m w 1.99 , e = 10" 1 . 



(35) 



Remember that the exponent for a random walk in energy space is S = 2 from the general results on one-dimensional 
random walks. 
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In figure 3 the increase of r rmH with the number of the spins in the system is displayed on a log-log scale. 
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FIG. 3. Log-log- plot of the rms value of the "tunneling times" r rms versus the number of spins, N — 2" (y = 2, 3, . . . , 9), in 
the system for the infinite-range ferromagnet. The diamonds stem from simulations with e = 10 -3 , the crosses from ones with 
e = 10 -2 , and the open squares from ones with e = 10~ . 

Here the straight lines corresponding to the fits r rms = c rma iV' 5ims give 
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We see from the figures that the width of the distribution is of the order of its mean, which can be noted easily already 
for N = 1, where one knows the (power-law-like) distribution of the tunneling times and its properties analytically. 

In all of the above simulations, I have compared the obtained values for the entropy with the true ones, from equation 
(|l4|). Depending on the value of e, on the number of times that I averaged over the (estimated instantaneous) entropy 
values and on the total time of the simulation, I got more and less accurate results. In any case, for the runs of 
figures 2 and 3, the error was typically of the order of e. In figure 4, a comparison of the entropy values, obtained for 
a system containing 128 spins, using the algorithm with e = 10~ 2 and running it for a total of 50960 MCS, with the 
exact values is shown. 
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FIG. 4. Comparison of the entropy values, S(m), obtained through the FEMC algorithm and e = 1CP 2 (crosses) with the 
true entropy values, So(m), and comparison of the ones obtained through a random walk in configuration space (diamonds) 
with the true entropy values, for a system with 256 spins. The FEMC algorithm has been run in total for 50960 MCS, the one 
of the random sampling for 1040960 MCS. 



As can be seen from figure 4, the matching of the values (of the number of configurations, not of the entropy !) is 
quite impressive, taking into account the fact that there are 2 configurations. I have also compared it in the figure 
to the values obtained by a run with a larger number of MCS, namely 1040960 MCS, but performing a local random 
walk in configuration space. In this last run, I have counted the configurations during the run and normalised the 
sum of all the hits to 2 128 . It is easily seen from the figure that, in contrast to the FEMC algorithm (with much fewer 
MCS !), ergodicity has been lost, configurations with energy E(m) for m < 33 have not even been sampled ! 

In conclusion, the FEMC algorithm provides for ergodicity of the system in the used computer time, and satisfactory 
estimates of the entropy for e in the range between 10 _1 and 10~ 3 . If one is rather interested in smaller tunneling 
times, one should however run the algorithm with larger e. 



B. The two- and three-dimensional ferromagnets 



In the case of the two-dimensional ferromagnct I have performed runs on systems of linear size L = 2™, where 
n = 1, 2, . . . , 6, and for the three-dimensional ferromagnet on systems of linear size L = 2™, where n= 1, 2, 3, 4. Again, 
I have compared runs where the tunneling times where measured from the beginning with ones where the tunneling 
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times of the first x ■ M 2 (x — 1,2,..., 10) MCS were not taken into account and observed that the distribution of 
the tunneling times, its mean and rms value remained essentially unaltered. However, to be again on the safe side, 
I have not counted the tunnclings observed during the first 10 • M 2 MCS in the runs whose results are displayed 
in the following. In all of the different runs, the histogram of the energies, averaged at the same time than the 
(instantaneous) entropy values, is flat to within the same percentages as in the infinite-range ferromagnet. 

The results shown in figure 5 (for the two-dimensional ferromagnet) have been obtained with a total number of 
10 6 + 10 • N 2 MCS. As these runs were done mainly to test the tunneling ability of the algorithm, I have fixed e to 
equal 10 _1 . This implies that the values r m and r rmB , the mean and the rms value of the tunneling times, have been 
obtained from 106825 values for L = 2 to 14 values for L = 64. The same statistical errors as for the infinite-range 
ferromagnet play a role here. The distributions for the tunneling times, r, themselves are, again, intrinsically very 
broad (their width is of the order of their mean !) with an accordingly long tail. In the runs displayed in the figure 
393 tunneling events occured with times larger than 5 • r m for L = 2 down to 1 for L = 64. The statistics for the 
larger systems may again be insufneiant because of the long tail of the distribution. 

Figure 5 shows the increase of r m and r rms with the volume of the system on a log-log scale. 

1 1 1 1 1 1 1 

$ 

+ 

o 

+ 

o 

+ 

I I I I I I I 

1 23456789 

log(N) 

FIG. 5. Log-log-plot of the mean "tunneling times" r m (crosses) and of their rms value r rm3 (diamonds) versus the volume 
size of the system, V = (2 n ) (n = 1,2, . . . , 6), for the two-dimensional ferromagnet. 

I have fitted straight lines (by eyesight) to the data points in figure 5, corresponding to the fits r m = c^N 6 ™ . The 
results for the fit parameters are 
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ln(c m ) w 0.36 , 5 m « 2.10 , 

(37) 

ln(c rma ) w 0.42 , <5 m w 2.07 . 

We notice again from the figure that the width of the distribution is of the order of its mean. 

The results shown in figure 6 (for the three-dimensional ferromagnet) have also been obtained with a total number 
of 10 6 + 10 • A/" 2 MCS and e = 10 _1 . This implies that the values r m and r rm3 have been obtained from 443 values for 
L = 2 down to 6 values for L = 16. In the runs displayed in the figure, 3 tunneling events occured with times larger 
than 5 • r m for L = 2 down to for L = 16. The statistics for the larger systems may again be insufficiant because 
of the long tail of the distribution. Figure 6 shows the increase of T m and r rmB with the volume of the system on a 
log-log scale. 
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FIG. 6. Log-log-plot of the mean "tunneling times" r m (crosses) and of their rms value r rma (diamonds) versus the volume 
size of the system, V = (2 n ) 3 (n = 1, 2, . . . , 4), for the three-dimensional ferromagnet. 

Finally, I have also determined approximate values of the entropy for the 4x4x4-ferromagnet in order to compare 
them to the ones known exactly |37]]. The resulting comparison can be seen in figure 7. The FEMC simulation 
of a total of 120000 MCS (of which the first 100000 MCS were discarded from the measurements; during the last 
20000 MCS the entropy was averaged over the instanteneous values of every 100th MCS) has been done with fixed 
e = 5-10~ 4 . 
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FIG. 7. Comparison of the ratio of the number of configurations from FEMC simulation/exact values (diamonds) for the 
4x4x4-dimensional ferromagnet from a run with a total of 120000 MCS and of the ratio of the number of configurations from 
a simulation through random walk in configuration space/exact values (crosses) for 10 6 MCS. 



As can be seen from the figure, the matching of the values (of the number of configurations, not of the entropy !) 
is again quite impressive, taking into account the fact that there are 2 configurations. I have also compared it in 
the figure to the values obtained by a run performing a local random walk in configuration space, of even more MCS, 
namely 10 6 MCS. In this latter run, I have counted the configurations sampled during the run and normalised the 
sum of all the hits to 2 64 . It is easily seen from the figure that, in contrast to the FEMC algorithm, ergodicity has 
again been lost, configurations with energies \E\ > 88 have not even been sampled ! 

In conclusion, the FEMC algorithm provides also for the two- and three-dimensional fcrromagnets for ergodicity 
of the system in the used computer time, and satisfactory estimates of the entropy for e in the range between 10 
and 5 • 10~ 4 . If one is rather interested in smaller tunneling times, one should however again run the algorithm with 
larger e. 



IV. CONCLUSION 



Recently Monte Carlo sampling with respect to unconventional ensembles has received some attention [9-34] . Mul- 
ticanonical and related sampling has allowed considerable gains in situations with "supercritical" slowing down, such 
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as first order phase transitions |lj,|l^,g2|,g4[ and systems with conflicting constraints, such as spin glasses |15yi7| , |31[ , |32| 
or proteins ]2^,|3(J. They seem, however, still be haunted by some difficulties, in particular for the latter systems ||. 

For first order phase transitions of non-random systems the problem of the a-priori unknown entropy is rather 
elegantly overcome by means of finite size scaling methods |I^ , p^| , ^2| , p3| , p5|j34| . A sufficiently accurate estimate is 
obtained by extrapolation from the already simulated smaller lattices. The smallest lattices allow still for efficient 
canonical simulations. For the three-dimensional Ising ferromagnet it is clear that the finite size scaling methods 
employed in Jl2|,|l3|,^2| provide reliable estimates of the multicanonical parameters. On the other hand, this approach 
breaks down [l5| ] for the important class of disordered systems. For instance for spin glasses one has to perform 
the additional average over quenched random variables (the coupling constants). Different choices of these random 
variables define different realisations of the (finite-size) system which imply different entropy functions for the different 
realisations. Then algorithms like the one of this paper become crucial, but the Ising ferromagnet is still a suitable 
testing ground to set quantitative performance scales. 

In this paper, I have introduced a new algorithm in the above spirit. I have analytically proven that the entropy of 
a system can be obtained from a Free Energy Monte Carlo algorithm at infinite temperature (/3 = in equation (||)) 
in the infinite-time limit and applied the algorithm to the infinite-range as well as to the two- and three-dimensional 
ferromagnets. I have simulated these systems to obtain the entropy function and to investigate the "tunneling times" 
of getting from one energy minimum to another one. The results are very encouraging: 

(i) The scaling of the "tunneling times" with the system size are almost the ones of a random walk in energy space. 

(ii) The entropy function of the considered systems could be obtained roughly within errors of order e in the used 

computer time. 

(iii) More importantly, ergodicity could be retained in all the considered cases in the used computer time (~ 10 6 
MCS at most). 

In particular, the last fact, the retaining of the ergodicity, should allow for a calculation of physical quantities, such as 
correlation functions, through equation (Q) near zero temperature where conventional MC simulations fail. It is also 
interesting to use the full FEMC algorithm (with nonvanishing (3). More results with the latter shall be published 
elsewhere |pq| . It is particularly interesting to determine the best choice of the parameters e and f3 and of their 
ratio to overcome energy barriers in the least amount of time and to explore the energy space for minima or more 
generally extrema. Also, more analytical details and properties of the algorithm can be obtained for the case of the 
infinite-range ferromagnet J36|. It shall be interesting to apply the algorithm in the cases where hitherto ergodicity 
problems have not allowed to obtain results. 

One last application of the algorithm should be mentioned here. Constructing a general model of on-line learning 
is an important challenge in the theory of learning and its application. A plausible definition of the goal of supervised 
learning from examples is to find a weight vector w that minimises the generalisation error, e 5 (w). A general model 
of batch learning, in which the learner has free access to a fixed set of examples, is based on minimisation of the total 
training error. Indeed, as the size of the training set, P, grows this procedure converges uniformly to the minimum of 
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the generalisation error. In systems with continuously varying weights, the rate of convergence to this limit follows 
generically a power law |p^| , A similar general model does not yet exist for on-line learning, where, at each time 
step, the learner receives a single new example and is unable to store previous examples in memory. The conventional 
on-line algorithm is based on the gradient of the instantaneous error |^,^0|. For a sufficiently small learning rate, it 
converges to a local minimum of e s (w) but not necessarily to the global one. More importantly, it is not applicable 
to learning of boolean functions or of other discrete valued functions which are extremely useful for decision and 
classification tasks. Recently, an on-line Gibbs algorithm has been proposed pdf as the first on-line algorithm that 
guarantees convergence to the minimal generalisation error for non-smooth systems, in particular for systems with 
discrete valued outputs or threshold hidden units. The price that is paid is an increased complexity of the computation 
at each presentation of examples. In particular, for systems in which the generalisation error has local minima, the 
on-line Gibbs algorithm may require a slow annealing schedule of the temperature variable, used in the algorithm, 
which might yield a slow global convergence rate. In addition, the algorithm relies on the possibility of updating the 
weights by small increments. Consequently, it is inapplicable to systems with discrete valued weights. In the case 
of the FEMC algorithm, gradient descent and escaping from minima can be combined naturally (see equation (||)) 
so that the FEMC may offer a valuable alternative to optimise the learning procedure. Serious questions concerning 
on-line learning are still open, most importantly the one is on the global convergence rate, which is yet unknown in 
general. 
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